July 10, 2025
Thursday
Previous: two groups, continuous outcome
Now: more than two groups, continuous outcome
One-way ANOVA
Kruskal-Wallis
We have previously discussed testing the difference between two groups.
To compare more than two groups, we will use a method called analysis of variance (ANOVA).
Fun fact: the two-sample t-test is a special case of ANOVA.
Two-sample t-test:
Two-sample t-test for two independent means and equal variance:
Null: H₀: μ₁ − μ₂ = 0
Alternative: H₁: μ₁ − μ₂ ≠ 0
Test statistic: t(98) = 0.52
p-value: p = 0.604
Conclusion: Fail to reject the null hypothesis (p = 0.6041 ≥ α = 0.05)
One-Way ANOVA:
H₀: μ_A = μ_B
H₁: At least one group mean is different
Test Statistic: F(1, 98) = 0.271
p-value: p = 0.604
Conclusion: Fail to reject the null hypothesis (p = 0.6041 ≥ α = 0.05)
The computations for ANOVA are more involved than what we’ve seen before.
An ANOVA table will be constructed in order to perform the hypothesis test.
| Source | Sum of Squares | df | Mean Squares | F |
|---|---|---|---|---|
| Treatment | SSTrt | dfTrt | MSTrt | F0 |
| Error | SSE | dfE | MSE | |
| Total | SSTot | dfTot | | |
Once this is put together, we can perform the hypothesis test.
We need the grand mean and, for each group i, the sample size, group mean, and group variance: \bar{x}, \ \ n_i, \ \ \bar{x}_i, \ \ s_i^2
\begin{align*} \text{SS}_{\text{Trt}} &= \sum_{i=1}^k n_i(\bar{x}_i-\bar{x})^2 \\ \text{SS}_{\text{E}} &= \sum_{i=1}^k (n_i-1)s_i^2 \\ \text{SS}_{\text{Tot}} &= \text{SS}_{\text{Trt}} + \text{SS}_{\text{E}} \end{align*}
\begin{align*} \text{df}_{\text{Trt}} &= k-1\\ \text{df}_{\text{E}} &= n-k\\ \text{df}_{\text{Tot}} &= n-1 \end{align*}
Once we have the sum of squares and corresponding degrees of freedom, we can compute the mean squares.
In the case of one-way ANOVA, \begin{align*} \text{MS}_{\text{Trt}} &= \frac{\text{SS}_{\text{Trt}}}{\text{df}_{\text{Trt}}} \\ \text{MS}_{\text{E}} &= \frac{\text{SS}_{\text{E}}}{\text{df}_{\text{E}}} \end{align*}
In general, \text{MS}_X = \frac{\text{SS}_X}{\text{df}_X}
Finally, we have the test statistic.
Generally, we construct an F statistic for ANOVA by dividing the mean square of interest by \text{MS}_{\text{E}}:
F_X = \frac{\text{MS}_X}{\text{MS}_{\text{E}}}
F_0 = \frac{\text{MS}_{\text{Trt}}}{\text{MS}_{\text{E}}}
| Source | Sum of Squares | df | Mean Squares | F |
|---|---|---|---|---|
| Treatment | SSTrt | dfTrt | MSTrt | F0 |
| Error | SSE | dfE | MSE | |
| Total | SSTot | dfTot | | |
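To see how the table's entries fit together, we can compute them by hand in R; the group sizes, means, and variances below are made-up values for illustration only:

```r
# Build a one-way ANOVA table from group summary statistics (illustration values).
n_i    <- c(5, 5, 5)      # group sample sizes
xbar_i <- c(10, 12, 14)   # group means
s2_i   <- c(4, 5, 3)      # group variances

n    <- sum(n_i)                  # total sample size
k    <- length(n_i)               # number of groups
xbar <- sum(n_i * xbar_i) / n     # grand mean

SS_Trt <- sum(n_i * (xbar_i - xbar)^2)   # between-group sum of squares
SS_E   <- sum((n_i - 1) * s2_i)          # within-group sum of squares
MS_Trt <- SS_Trt / (k - 1)               # mean square for treatment
MS_E   <- SS_E / (n - k)                 # mean square error
F0     <- MS_Trt / MS_E                  # F0 = 20 / 4 = 5
pf(F0, k - 1, n - k, lower.tail = FALSE) # p-value
```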
We will use the one_way_ANOVA_table() function from library(ssstats) to construct the ANOVA table.

In the magical land of Equestria, ponies come in different types: Unicorns, Pegasi, Earth Ponies, and Alicorns. While each group has unique abilities, some researchers in Twilight Sparkle’s lab are curious whether these pony types differ in their overall magic ability scores, a standardized measure that combines magical potential, control, and versatility.

To investigate this, they collect data (magical_studies) on a random sample of ponies from each pony type (pony_type). The researchers want to know if there is a difference in magic ability scores (magic_abiity_score) among the four pony types. We will test at the \alpha=0.05 level.
Let’s first create the ANOVA table. How should we update this code?
Let’s first create the ANOVA table. Our updated code:
Running the code,
In one-way ANOVA, the hypotheses always take the same form: H_0: \mu_1 = \mu_2 = \cdots = \mu_k versus H_1: at least one group mean is different.
Note 1: you must fill in the “k” when writing hypotheses!
e.g., if there are four means, your hypotheses are H_0: \mu_1 = \mu_2 = \mu_3 = \mu_4 versus H_1: at least one group mean is different.
e.g., in our MLP example, the four means are the mean magic ability scores of Unicorns, Pegasi, Earth Ponies, and Alicorns.
Test statistic:
F_0 = \frac{\text{MS}_{\text{Trt}}}{\text{MS}_{\text{E}}}
p-Value:
p = P[F_{k-1,n-k} \ge F_0]
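For example, we can find this upper-tail probability with base R's pf(), using the F statistic from the two-group example above:

```r
# Upper-tail probability P[F_{1, 98} >= 0.271], matching the earlier example.
pf(0.271, df1 = 1, df2 = 98, lower.tail = FALSE)  # ~ 0.604, same p-value as the t-test
```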
We will use the one_way_ANOVA() function from library(ssstats) to perform the hypothesis test.

In the magical land of Equestria, ponies come in different types: Unicorns, Pegasi, Earth Ponies, and Alicorns. While each group has unique abilities, some researchers in Twilight Sparkle’s lab are curious whether these pony types differ in their overall magic ability scores, a standardized measure that combines magical potential, control, and versatility.

To investigate this, they collect data (magical_studies) on a random sample of ponies from each pony type (pony_type). The researchers want to know if there is a difference in magic ability scores (magic_abiity_score) among the four pony types. We will test at the \alpha=0.05 level.
Let’s now formulate the hypothesis test. How should we update this code?
Let’s now formulate the hypothesis test. Our updated code,
Today we have introduced ANOVA. Recall the hypotheses: H_0: \mu_1 = \mu_2 = \cdots = \mu_k versus H_1: at least one group mean is different.
The F test does not tell us which mean is different… only that a difference exists.
In theory, we could perform repeated t tests to determine pairwise differences.
Recall that ANOVA is an extension of the t test… or that the t test is a special case of ANOVA.
However, this will increase the Type I error rate (\alpha).
Recall that the Type I error rate, \alpha, is the probability of incorrectly rejecting H_0.
Suppose we are comparing 5 groups.
This is 10 pairwise comparisons!!
If we perform repeated t tests under \alpha=0.05, we are inflating the Type I error to 0.40! 😵
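The inflation figure follows from the complement rule, assuming the tests are independent:

```r
# With m independent tests each at level alpha, the chance of at least one
# false rejection is 1 - (1 - alpha)^m.
m     <- choose(5, 2)   # 10 pairwise comparisons among 5 groups
alpha <- 0.05
1 - (1 - alpha)^m       # ~ 0.40
```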
When performing posthoc comparisons, we can choose one of two paths: either control the overall Type I error rate across all comparisons, or leave it uncontrolled.
Note that controlling the Type I error rate is more conservative than when we do not control it.
Generally, statisticians:
do not control the Type I error rate if examining the results of pilot/preliminary studies that are exploring for general relationships.
do control the Type I error rate if examining the results of confirmatory studies and are attempting to confirm relationships observed in pilot/preliminary studies.
The posthoc tests we will learn:
Tukey’s test
Fisher’s least significant difference
Dunnett’s test
Caution: we should only perform posthoc tests if we have determined that a general difference exists!
Tukey’s test allows us to do all pairwise comparisons while controlling \alpha.
The underlying idea of the comparison:
We declare \mu_i \ne \mu_j if |\bar{y}_i - \bar{y}_j| \ge W, where W = \frac{q_{\alpha}(k, \text{df}_{\text{E}})}{\sqrt{2}} \sqrt{\text{MSE} \left( \frac{1}{n_i} + \frac{1}{n_j} \right)}
We will use the TukeyHSD() function.
We first fit the model using the aov() function.
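As a minimal sketch (with made-up data and generic variable names), we fit the model with aov() and pass it to TukeyHSD():

```r
set.seed(1)  # made-up illustration data
dat <- data.frame(
  group = rep(c("A", "B", "C"), each = 6),
  y     = c(rnorm(6, mean = 10), rnorm(6, mean = 12), rnorm(6, mean = 15))
)

m_tukey <- aov(y ~ group, data = dat)  # fit the one-way ANOVA model
TukeyHSD(m_tukey)                      # all pairwise comparisons, alpha controlled
```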
Fisher’s least significant difference allows us to test all pairwise comparisons, but it does not control the overall \alpha.
The underlying idea of the comparison: we declare \mu_i \ne \mu_j if |\bar{y}_i - \bar{y}_j| \ge \text{LSD}, where \text{LSD} = t_{\alpha/2, \text{df}_{\text{E}}} \sqrt{\text{MSE} \left( \frac{1}{n_i} + \frac{1}{n_j} \right)}
We will use the LSD.test() function from the agricolae package.
We first fit the model with the aov() function.

```r
library(agricolae)

results <- summary(m)                       # summary of the fitted aov() model
(LSD.test(dataset_name$continuous_variable, # continuous outcome
          dataset_name$grouping_variable,   # grouping variable
          results[[1]]$Df[2],               # df_E
          results[[1]]$`Mean Sq`[2],        # MSE
          alpha = alpha_level)              # can omit if alpha = 0.05
)[5]                                        # limit to only the pairwise comparison results
```
```r
LSD.test(data$strength,
         data$system,
         results[[1]]$Df[2],
         results[[1]]$`Mean Sq`[2],
         alpha = 0.01)[5]
```
Dunnett’s test allows us to do all pairwise comparisons against only the control, while controlling \alpha.
This has fewer comparisons than Tukey’s because we are not comparing non-control groups to one another.
i.e., we are sharing the \alpha between fewer comparisons now, which is preferred if we are not interested in the comparisons between non-control groups.
The underlying idea of the comparison:
We declare \mu_i \ne \mu_j if |\bar{y}_i - \bar{y}_j| \ge D, where D = d_{\alpha}(k-1, \text{df}_{\text{E}}) \sqrt{\text{MSE} \left( \frac{1}{n_i} + \frac{1}{n_c} \right)}, d_{\alpha}(k-1, \text{df}_{\text{E}}) is Dunnett’s critical value, and n_c is the sample size of the control group.
We will use the DunnettTest() function from the DescTools package to perform Dunnett’s test.

Let’s apply Dunnett’s to the dental data.
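A minimal sketch with made-up data; "Control" is an assumed label here, and in practice the control = argument should name your own reference group:

```r
library(DescTools)  # provides DunnettTest()

set.seed(2)  # made-up illustration data
dat <- data.frame(
  group = rep(c("Control", "Trt1", "Trt2"), each = 6),
  y     = c(rnorm(6, mean = 10), rnorm(6, mean = 11), rnorm(6, mean = 14))
)

# Compare each treatment group against the control only.
DunnettTest(y ~ group, data = dat, control = "Control")
```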
We previously discussed testing three or more means using ANOVA.
We also discussed that ANOVA is an extension of the two-sample t-test.
Recall that the t-test has two assumptions:
Equal variance between groups.
Normal distribution.
We will extend our knowledge of checking assumptions today.
y_{ij} = \mu + \tau_i + \varepsilon_{ij}
where \mu is the overall mean, \tau_i is the effect of treatment i, and \varepsilon_{ij} is the random error for observation j in treatment i.
We assume that the error term follows a normal distribution with mean 0 and a constant variance, \sigma^2. i.e., \varepsilon_{ij} \overset{\text{iid}}{\sim} N(0, \sigma^2)
Very important note: the assumption is on the error term and NOT on the outcome!
We will use the residual (the difference between the observed value and the predicted value) to assess assumptions: e_{ij} = y_{ij} - \hat{y}_{ij}
Normality: quantile-quantile plot
Variance: scatterplot of the residuals against the predicted values
Like with t-tests, we will assess these assumptions graphically.
We will return to the classpackage package and use the anova_check() function.
```r
library(tidyverse)

strength <- c(15.4, 12.9, 17.2, 16.6, 19.3,
              17.2, 14.3, 17.6, 21.6, 17.5,
               5.5,  7.7, 12.2, 11.4, 16.4,
              11.0, 12.4, 13.5,  8.9,  8.1)
system <- c(rep("Cojet", 5), rep("Silistor", 5), rep("Cimara", 5), rep("Ceramic", 5))
data <- tibble(system, strength)

m <- aov(strength ~ system, data = data)
summary(m)
```

```
            Df Sum Sq Mean Sq F value  Pr(>F)   
system       3  200.0   66.66   7.545 0.00229 **
Residuals   16  141.4    8.84                   
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```
We can formally check the variance assumption with the Brown-Forsythe-Levene test.
The test statistic is calculated as follows, F_0 = \frac{\sum_{i=1}^k n_i (\bar{z}_i - \bar{z})^2/(k-1)}{\sum_{i=1}^k \sum_{j=1}^{n_i}(z_{ij}-\bar{z}_i)^2/(n-k) }, where z_{ij} = |y_{ij} - \tilde{y}_i| is the absolute deviation of observation y_{ij} from its group median \tilde{y}_i, \bar{z}_i is the mean of the z_{ij} in group i, and \bar{z} is the overall mean of the z_{ij}.
Hypotheses
Test Statistic
p-Value
Rejection Region
We will use the leveneTest() function from the car package.
We call it as car::leveneTest() rather than loading the entire car package, because it overwrites a necessary function in tidyverse.

Hypotheses
Test Statistic and p-Value
Rejection Region
Conclusion/Interpretation
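As a sketch on the bond-strength data from the aov() example above (leveneTest()'s default center = median gives the Brown-Forsythe version):

```r
# Bond-strength data from the aov() example above.
strength <- c(15.4, 12.9, 17.2, 16.6, 19.3,
              17.2, 14.3, 17.6, 21.6, 17.5,
               5.5,  7.7, 12.2, 11.4, 16.4,
              11.0, 12.4, 13.5,  8.9,  8.1)
system <- factor(c(rep("Cojet", 5), rep("Silistor", 5),
                   rep("Cimara", 5), rep("Ceramic", 5)))

# center = median is the default and gives the Brown-Forsythe version.
car::leveneTest(strength ~ system, center = median)
```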
\varepsilon_{ij} \overset{\text{iid}}{\sim} N(0, \sigma^2)
We also discussed how to assess the assumptions:
Graphically using the anova_check() function.
Confirming the variance assumption using the BFL.
If we break an assumption, we will turn to the nonparametric alternative, the Kruskal-Wallis.
If we break ANOVA assumptions, we should implement the nonparametric version, the Kruskal-Wallis.
The Kruskal-Wallis test determines if k independent samples come from populations with the same distribution.
Our new hypotheses are H_0: the k population distributions are identical versus H_1: at least one population distribution is different. The test statistic is
\chi^2_0 = \frac{12}{n(n+1)} \sum_{i=1}^k \frac{R_i^2}{n_i} - 3(n+1),
where R_i is the sum of the ranks in group i, n_i is the sample size of group i, and n is the total sample size.
The test statistic, often denoted H, follows a \chi^2 distribution with k-1 degrees of freedom.
Hypotheses
Test Statistic
p-Value
Rejection Region
We will use the kruskal.test() function to perform the Kruskal-Wallis test.

Hypotheses
Test Statistic and p-Value
Rejection Region
Conclusion/Interpretation
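For example, applying base R's kruskal.test() to the bond-strength data from the aov() example above:

```r
# Bond-strength data from the aov() example above.
strength <- c(15.4, 12.9, 17.2, 16.6, 19.3,
              17.2, 14.3, 17.6, 21.6, 17.5,
               5.5,  7.7, 12.2, 11.4, 16.4,
              11.0, 12.4, 13.5,  8.9,  8.1)
system <- factor(c(rep("Cojet", 5), rep("Silistor", 5),
                   rep("Cimara", 5), rep("Ceramic", 5)))

kruskal.test(strength ~ system)  # chi-squared statistic with k - 1 = 3 df
```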
We can also perform posthoc testing in the Kruskal-Wallis setting.
The setup is just like Tukey’s – we can perform all pairwise comparisons and control for the Type I error rate.
Instead of using |\bar{y}_i - \bar{y}_j|, we will use |\bar{R}_i - \bar{R}_j|, where \bar{R}_i is the average rank of group i.
The comparison we are making: we declare a difference between groups i and j if |\bar{R}_i - \bar{R}_j| exceeds the critical difference.
We will use the kruskalmc() function from the pgirmess package to perform the Kruskal-Wallis post-hoc test.

```
Multiple comparison test after Kruskal-Wallis 
alpha: 0.05 
Comparisons
                 obs.dif critical.dif stat.signif
Ceramic-Cimara       0.2     9.871455       FALSE
Ceramic-Cojet        7.9     9.871455       FALSE
Ceramic-Silistor    10.3     9.871455        TRUE
Cimara-Cojet         8.1     9.871455       FALSE
Cimara-Silistor     10.5     9.871455        TRUE
Cojet-Silistor       2.4     9.871455       FALSE
```
Today we have talked about assessing ANOVA assumptions and performing the nonparametric alternative, the Kruskal-Wallis.
Per usual, we should only look at posthoc testing when we’ve detected an overall difference with the Kruskal-Wallis.
Next lecture: two-way ANOVA.
STA4173 - Biostatistics - Summer 2025